8370176: Mixed mode jhsdb jstack cannot unwind call stack with -Xcomp #27885
👋 Welcome back ysuenaga! A progress list of the required criteria for merging this PR into
@YaSuenag This change is no longer ready for integration - check the PR body for details.
/issue JDK-8370176
@YaSuenag The primary solved issue for a PR is set through the PR title. Since the current title does not contain an issue reference, it will now be updated.
 * @test
 * @bug 8370176
 * @requires vm.hasSA
 * @requires os.family == "linux"
Do Windows and OSX have a similar problem that should be fixed also?
This problem occurs only in mixed mode (PStack), so we need to skip OSX because, as you mentioned, mixed mode is not supported on OSX.
On Windows, I'm not sure, but I guess we need to consider UNWIND_INFO to unwind call frames correctly, like DWARF on Linux; however, that has not been done yet. So we can consider mixed mode unsupported on Windows too, which is why I didn't add Windows here.
https://learn.microsoft.com/cpp/build/exception-handling-x64
Actually, I could not see the whole stack in mixed mode, as shown below. It works in normal mode (without --mixed), of course. (I tested it on Windows 11 x64 with an upstream JDK built with VS 2022.)
----------------- 13 -----------------
"Reference Handler" #15 daemon prio=10 tid=0x00000207280b9f70 nid=12684 waiting on condition [0x000000aaf6aff000]
java.lang.Thread.State: RUNNABLE
JavaThread state: _thread_blocked
0x00007fffa6b45844 ntdll!NtWaitForAlertByThreadId + 0x14
0x00000000ffffffff ????????
Mind you, this new test seems to fail even on Linux systems without pstack. This happens on both my AMD64 machine running Debian 12 and my ARM64 machine running Ubuntu 22.04.4.
Can you share the .jtr file?
Sure. This is what I got on my amd64 machine:
$ make test TEST="serviceability/sa/TestJhsdbJstackMixedWithXComp.java"
@RealFYang Thank you for sharing it!
I think it might be caused by a difference in the binaries; at least it is not caused by this PR. So I think we can go forward with this PR. Does that make sense?
Your .jtr file implies that stack unwinding failed at functions in libc, as follows:
----------------- 2310034 -----------------
"SteadyStateThread" #39 prio=5 tid=0x00007fd2600358a0 nid=2310034 waiting for monitor entry [0x00007fd2351f4000]
java.lang.Thread.State: BLOCKED (on object monitor)
JavaThread state: _thread_blocked
0x00007fd267930f16 __futex_abstimed_wait_common + 0xc6
----------------- 2310033 -----------------
"ForkJoinPool-1-worker-2" #38 daemon prio=5 tid=0x00007fd1ec006600 nid=2310033 runnable [0x00007fd2352f5000]
java.lang.Thread.State: RUNNABLE
JavaThread state: _thread_in_native
0x00007fd26797a545 __clock_nanosleep + 0x65
0x00007fd26797ee53 __GI___nanosleep + 0x13
Native stack unwinding on Linux AMD64 depends on DWARF (on AArch64, it still depends on FP (x29)).
I downloaded and checked libc.so.6 in libc6-udeb_2.41-12_amd64.udeb: it has an .eh_frame section, which would be used by DwarfParser, but it has no symbols and no .gnu_debuglink ELF section. OTOH, Fedora 43, which I confirmed works, has both symbols and .gnu_debuglink.
They are used for symbol resolution, not stack unwinding. However, other differences in the binary might affect stack unwinding. Thus I think it is not a problem caused by this PR.
Hi, I am just wondering if there is a workaround for these platforms. Or can we simply skip this test when running on them? Say, if this depends on the availability of pstack, maybe we can add a check for that. Otherwise, we may introduce test noise for people who use them.
I could reproduce the problem not only on Ubuntu 22.04 but also on 23.04. However, it did not happen on Ubuntu 24.04.
According to your report, the problem also happens on AArch64, which implies the problem is not only in the DWARF parser. (The DWARF parser is only available on Linux AMD64 so far.)
AFAICS stack unwinding fails at functions in glibc (on Ubuntu 22.04 and 23.04 at least), so I suspect something is wrong with the glibc binary, its behavior, and/or the compiler options on Ubuntu, but I'm not sure.
I checked the glibc version with gnu_get_libc_version(): it returns "2.37" on Ubuntu 23.04 and "2.39" on Ubuntu 24.04. So I think the test can call gnu_get_libc_version() via FFM at the start and skip itself if it runs on glibc 2.38 or earlier. Is that OK?
I grep'ed the test directory for "mixed" and found other tests (TestJhsdbJstackMixed.java, TestJhsdbJstackPrintVMLocks.java). I will add the glibc check to them under another ticket if this solution is OK.
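The version gate proposed above could look roughly like the following. This is only a sketch of the comparison step: it assumes the version string has already been obtained from gnu_get_libc_version() (for example through an FFM downcall, which is not shown here), and the class and method names (GlibcVersionGate, shouldSkip) are hypothetical, not the names used in the actual test.

```java
// Hypothetical helper; not part of the JDK test library.
final class GlibcVersionGate {
    /** Parses "major.minor[.patch]" and returns {major, minor}. */
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("unexpected glibc version: " + version);
        }
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    /** True if the given version is at least major.minor. */
    static boolean isAtLeast(String version, int major, int minor) {
        int[] v = parse(version);
        return v[0] > major || (v[0] == major && v[1] >= minor);
    }

    /** The threshold from the discussion: skip on glibc 2.38 or earlier. */
    static boolean shouldSkip(String version) {
        return !isAtLeast(version, 2, 39);
    }

    public static void main(String[] args) {
        // The two versions reported in the thread for Ubuntu 23.04 and 24.04.
        for (String v : new String[] { "2.37", "2.39" }) {
            System.out.println("glibc " + v + ": " + (shouldSkip(v) ? "skip" : "run"));
        }
    }
}
```

Comparing major and minor as integers avoids the pitfall of comparing version strings lexicographically, which would misorder something like "2.9" and "2.39".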
I am not sure about the glibc version as I don't know much about the differences among these distributions.
But it works for me if you want to fix all the affected tests in another PR. Thanks for considering that.
I updated this PR to check the glibc version in TestJhsdbJstackMixedWithXComp.java, which is added by this PR. It skips the test on Ubuntu 22.04; OTOH it works on Fedora 43. This is expected.
I attempted to add this check to SATestUtils at first, but that seems difficult because native access would have to be allowed for all SATestUtils users, and the impact would be too significant.
I will file another issue after this PR to apply this check to the other tests that use jhsdb jstack --mixed.
Changes look good. Thanks for fixing this.
@YaSuenag :
@RealFYang Thanks a lot for sharing a patch for RISC-V! Merged into this PR.
@plummercj Thanks a lot for your review! I'm trying to fix mixed mode on Windows. I think we can unwind native stacks with this change, but it is not enough for Java frames. I think we will be able to see all of them with a follow-up change after this PR.
jhsdb jstack --mixed does not work when it attaches to a process running with -Xcomp.
This was reported by @pchilano in #27728. You can reproduce the problem with Test.java (attached to the JBS issue). You can see the following stack.
Thread.sleepNanos0 appears as the bottom frame, but the stack actually has more call frames. You can see them with -XX:+PreserveFramePointer.
A Java frame might use the frame pointer register (RBP on AMD64) as a general-purpose register, so SA cannot rely on it for stack unwinding.
The hs_err log contains a mixed stack trace under "Native frames". It is unwound by NativeStackPrinter in HotSpot, and it works as mixed mode. NativeStackPrinter uses frame::next_frame() to find the sender frame regardless of whether it is a Java frame or a C frame, and it uses the sender FP/PC to create the sender frame. SA, on the other hand, separates CFrame and VFrame when unwinding in mixed mode jstack, so the sender FP/PC is not propagated to CFrame, and the frame located at the bottom of the Java frames might not be shown.
It is difficult to unify the unwinder in PStack in SA, so it is reasonable to propagate the sender FP/PC to the sender of the CFrame.
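To illustrate why the frame-pointer chain breaks and why handing over the sender FP/PC helps, here is a toy model of a three-frame stack. It is a sketch under stated assumptions, not SA or HotSpot code: every name (MixedUnwindSketch, the addresses, the symbol names) is made up, and the frame layout (saved FP and return PC just below the sender SP, frame size looked up by PC) is a simplification of what a compiled frame might look like on AMD64.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names exist in SA or HotSpot.
final class MixedUnwindSketch {
    static final long WORD = 8;
    // Hypothetical register state of the innermost frame in the sample stack.
    static final long START_PC = 0x100L, START_SP = 0x1080L, START_FP = 0x10A0L;

    final Map<Long, Long> mem = new HashMap<>();           // simulated stack memory
    final Map<Long, String> names = new HashMap<>();       // pc -> symbol
    final Map<Long, Long> javaFrameSize = new HashMap<>(); // pc -> frame size, compiled Java only

    /**
     * Walks the stack. With propagate == false every frame is unwound through
     * the FP chain. With propagate == true a compiled Java frame computes its
     * sender SP/PC/FP from its frame size and hands them to the next frame.
     */
    List<String> unwind(boolean propagate) {
        List<String> out = new ArrayList<>();
        long pc = START_PC, sp = START_SP, fp = START_FP;
        while (names.containsKey(pc)) {
            out.add(names.get(pc));
            Long size = javaFrameSize.get(pc);
            long senderSp;
            Long senderPc, senderFp;
            if (size != null && propagate) {
                senderSp = sp + size;                     // fixed-size compiled frame
                senderPc = mem.get(senderSp - WORD);      // return address slot
                senderFp = mem.get(senderSp - 2 * WORD);  // caller's FP saved on entry
            } else {
                senderFp = mem.get(fp);                   // classic FP chain
                senderPc = mem.get(fp + WORD);
                senderSp = fp + 2 * WORD;
            }
            if (senderPc == null || senderFp == null) break; // unreadable memory: give up
            pc = senderPc; sp = senderSp; fp = senderFp;
        }
        return out;
    }

    static MixedUnwindSketch sample() {
        MixedUnwindSketch s = new MixedUnwindSketch();
        s.names.put(0x100L, "nanosleep");
        s.names.put(0x200L, "Test.run (compiled)");
        s.names.put(0x300L, "thread_entry");
        s.javaFrameSize.put(0x200L, 0x30L);
        // Native frame of nanosleep: its saved FP slot holds garbage because
        // the compiled Java caller used the FP register as a general register.
        s.mem.put(0x10A0L, 0xDEADBEEFL);
        s.mem.put(0x10A8L, 0x200L);   // return pc into the compiled Java code
        // Compiled Java frame: caller's FP and return pc just below its sender SP.
        s.mem.put(0x10D0L, 0x1100L);
        s.mem.put(0x10D8L, 0x300L);
        // Outermost native frame terminates the walk.
        s.mem.put(0x1100L, 0L);
        s.mem.put(0x1108L, 0L);
        return s;
    }

    public static void main(String[] args) {
        MixedUnwindSketch s = sample();
        System.out.println("FP chain only: " + s.unwind(false));
        System.out.println("propagating:   " + s.unwind(true));
    }
}
```

In this model the FP-only walk stops at the compiled Java frame, because the FP value it inherited points at unreadable memory, while the propagating walk recovers the caller's FP from the slot the compiled frame saved on entry and continues into the outer native frame.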
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27885/head:pull/27885
$ git checkout pull/27885
Update a local copy of the PR:
$ git checkout pull/27885
$ git pull https://git.openjdk.org/jdk.git pull/27885/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27885
View PR using the GUI difftool:
$ git pr show -t 27885
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27885.diff
Using Webrev
Link to Webrev Comment